
    Case Assignment and the Complement/Adjunct Dichotomy: A Non-Configurational Constraint-Based Approach

    HPSG for Slavicists

    Verbal Negation and Complex Predicate Formation in Polish

    … this paper rely on ARG-ST (see Przepiórkowski and Kupść (1997) and Przepiórkowski (1996b, 1995), respectively). Verb Clusters without negation

    Towards the Annotation of Named Entities in the National Corpus of Polish

    We present the named entity annotation task within the ongoing project of the National Corpus of Polish. To the best of our knowledge, this is the first attempt at large-scale corpus annotation of Polish named entities. We describe the scope and the TEI-inspired hierarchy of named entities admitted for this task, as well as the TEI-conformant multi-level stand-off annotation format. We also discuss some methodological strategies, including the annotation of embedded, coordinated and discontinuous names. Our annotation platform consists of two main tools interconnected by conversion facilities. The rule-based natural language processing platform SProUT is used for the automatic pre-annotation of named entities, relying on previously created Polish extraction grammars adapted to the annotation task. The customizable graphical tree editor TrEd, extended to our needs, provides an ergonomic environment for the manual correction of annotations. Despite some difficult cases encountered in the early annotation phase, about 2,600 named entities in 1,800 corpus sentences have been annotated so far, which has allowed us to validate the project methodology and tools.

    Tools and Methodologies for Annotating Syntax and Named Entities in the National Corpus of Polish

    The ongoing project aiming at the creation of the National Corpus of Polish assumes several levels of linguistic annotation. We present the technical environment and methodological background developed for the three upper annotation levels: the levels of syntactic words and syntactic groups, and the level of named entities. We show how the knowledge-based platforms Spejd and SProUT are used for the automatic pre-annotation of the corpus, and we discuss some particular problems faced during the development of the syntactic grammar, which contains over 800 rules and is one of the largest chunking grammars for Polish. We also show how the tree editor TrEd has been customized for the manual post-editing of annotations and for the further revision of discrepancies. Our XML format converters and customized archiving repository ensure automatic data flow and efficient corpus file management. We believe that this environment, or substantial parts of it, can be reused in or adapted for other corpus annotation tasks.

    The Standards' Landscape Towards an Interoperability Framework

    This document offers an overview of the current landscape towards an Interoperability Framework and serves as a reference point for the current standards that the community fosters and encourages others to adopt and improve. This initiative is closely synchronized with other relevant initiatives such as CLARIN, ELRA, ISO, TEI and META-SHARE. The document builds on the CLARIN Standardisation Action Plan and adapts and extends it to the needs of the broader LT community, beyond the SSH (social sciences and humanities) research areas and including industry. The main goal of this document is to give practical orientation to various LT players, both commercial and academic; the main message is that a harmonized domain of language resources and technology can be achieved stepwise, but that an effort to adopt standards is necessary to overcome fragmentation. NB: This is by no means intended as a static, closed document, but rather as a dynamic one that needs to be periodically revised and updated by the community itself.